Faster Average Case Low Memory Semi-External Construction of the Burrows-Wheeler Transform
نویسنده
چکیده
The Burrows Wheeler transform has applications in data compression as well as full text indexing. Despite its important applications and various existing algorithmic approaches the construction of the transform for large data sets is still challenging. In this paper we present a new semi external memory algorithm capable of constructing the transform in time O(n log2 log n) on average if su cient internal memory is available to hold a fixed fraction of the input text. In the worst case the run-time is O(n log n log log n).
منابع مشابه
Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform
The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n, σ, and Hk...
متن کاملCompressed-Domain Pattern Matching with the Burrows-Wheeler Transform
This report investigates two approaches for online pattern-matching in files compressed with the Burrows-Wheeler transform (Burrows & Wheeler 1994). The first is based on the Boyer-Moore pattern matching algorithm (Boyer & Moore 1977), and the second is based on binary search. The new methods use the special structure of the BurrowsWheeler transform to achieve efficient, robust pattern matching...
متن کاملComputing the BWT and LCP array of a Set of Strings in External Memory
Indexing very large collections of strings, such as those produced by the widespread next generation sequencing technologies, heavily relies on multi-string generalization of the Burrows-Wheeler Transform (BWT): recent developments in this field have resulted in external memory algorithms, motivated by the large requirements of in-
متن کاملLow Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n...
متن کاملOn the number of elements to reorder when updating a suffix array
Recently new algorithms appeared for updating the Burrows-Wheeler transform or the suffix array, when the text they index is modified. These algorithms proceed by reordering entries and the number of such reordered entries may be as high as the length of the text. However, in practice, these algorithms are faster for updating the Burrows-Wheeler transform or the suffix array than the fastest re...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014